Conversation

@naromero77amd naromero77amd commented Nov 12, 2025

In the ROCm fork of PyTorch 2.7, Inductor has codegen support for fast_tanhf. However, it is currently guarded by the TORCHINDUCTOR_USE_FAST_MATH environment variable due to NaN issues in the original Triton implementation of fast_tanhf.
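For background, NaNs of the kind described typically arise when a fast tanh is computed from the naive ratio (e^{2x} - 1) / (e^{2x} + 1), which overflows to inf/inf for large |x|. This is a hedged Python illustration of a stable reformulation, not the actual Triton fast_tanhf code:

```python
import math

def stable_tanhf(x: float) -> float:
    # Stable reformulation: tanh(x) = sign(x) * (1 - 2 / (exp(2|x|) + 1)).
    # exp(2|x|) can overflow for large |x|; treat overflow as +inf so the
    # result saturates to sign(x) * 1.0 instead of producing NaN.
    ax = abs(x)
    try:
        e = math.exp(2.0 * ax)
    except OverflowError:
        e = math.inf
    t = 1.0 - 2.0 / (e + 1.0)
    return math.copysign(t, x)
```

The naive two-exponential form has no such saturation path, which is the usual source of NaNs in fast tanh implementations.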

Upstream Triton has an improved fast_tanhf in which the NaN issues are fixed. This upstream commit has been backported to the ROCm fork of Triton (see code comments).

Thus, I have removed the conditionalization on Triton versions as well. A bump in the pinned Triton commit is also needed.
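As an illustration of the change (a hypothetical sketch; the actual Inductor codegen lives elsewhere and these function names are invented), the old behaviour gated fast_tanhf behind the environment variable, while the new behaviour always emits it on ROCm:

```python
import os

def tanh_codegen_old(x: str, is_hip: bool) -> str:
    # Previously: fast_tanhf was opt-in via TORCHINDUCTOR_USE_FAST_MATH
    # because the original Triton implementation could produce NaNs.
    use_fast_math = os.environ.get("TORCHINDUCTOR_USE_FAST_MATH", "0") == "1"
    if is_hip and use_fast_math:
        return f"libdevice.fast_tanhf({x})"
    return f"libdevice.tanh({x})"

def tanh_codegen_new(x: str, is_hip: bool) -> str:
    # After this change: on ROCm, always emit fast_tanhf (NaN fix backported).
    if is_hip:
        return f"libdevice.fast_tanhf({x})"
    return f"libdevice.tanh({x})"
```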

Other notes:

rocm-repo-management-api bot commented Nov 12, 2025

Jenkins build for 1b1fde5fcc342c2c0d3c69bf95a91501fc39b324 commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@naromero77amd (Author) commented:

I have confirmed that it resolves the reproducer in the Jira.

return f"libdevice.tanh({x})"
# On ROCm, always use fast_tanhf
# Requires ROCm fork of Triton 3.3, 3.4, 3.5 or upstream Triton 3.6+
if torch.version.hip:

2.7 uses 3.3 IIUC.

We should at least support 3.2 in 2.7, so let's conditionalise on triton > (3, 3) if 3.3 supports this.
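The suggestion above could be implemented with a small version gate, sketched here under the assumption that the Triton version is available as a dotted string (the helper names are hypothetical):

```python
def triton_version_tuple(version: str) -> tuple:
    # "3.3.1+gitdeadbeef" -> (3, 3): drop any local suffix, keep major.minor.
    main = version.split("+")[0]
    major, minor = main.split(".")[:2]
    return (int(major), int(minor))

def use_fast_tanhf(is_hip: bool, version: str) -> bool:
    # Assumed cutoff: the ROCm Triton fork carries the NaN fix from 3.3 on,
    # so emit fast_tanhf only on HIP with a new-enough Triton.
    return is_hip and triton_version_tuple(version) >= (3, 3)
```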

rocm-repo-management-api bot commented Nov 14, 2025

Jenkins build for f416c7119ad1443bf022a37a8f3f21b201aa4bbc commit finished as FAILURE
Links: Blue Ocean view / Build artifacts

@pruthvistony pruthvistony merged commit 9dc9120 into release/2.7 Nov 17, 2025
0 of 2 checks passed
@pruthvistony pruthvistony deleted the release_/2.7_new_fast_tanh branch November 17, 2025 18:16
pruthvistony pushed a commit that referenced this pull request Nov 17, 2025
In the ROCm fork of PyTorch 2.8, Inductor currently has codegen support
for fast_tanhf. However, there were some NaN issues in the original
Triton implementation of fast_tanhf.

Upstream Triton has an improved fast_tanhf in which the NaN issues are
fixed. This upstream commit has been backported to the ROCm fork of
Triton (see code comments).

A bump in the Triton commit is also needed.

Other notes:

- In support of
[SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.4 backport of upstream Triton commit
ROCm/triton#900
- Similar to #2802,
#2804
- Related to pytorch#162052
pruthvistony pushed a commit that referenced this pull request Nov 17, 2025
In the ROCm fork of PyTorch 2.9, Inductor currently has codegen support
for fast_tanhf. However, there were some NaN issues in the original
Triton implementation of fast_tanhf.

Upstream Triton has an improved fast_tanhf in which the NaN issues are
fixed. This upstream commit has been backported to the ROCm fork of
Triton (see code comments).

A bump in the Triton commit is also needed.

Other notes:

- In support of
[SWDEV-560271](https://ontrack-internal.amd.com/browse/SWDEV-560271)
- Triton 3.5 backport of upstream Triton commit
ROCm/triton#901
- Similar to #2802,
#2803
- Related to pytorch#162052
jeffdaily pushed a commit that referenced this pull request Nov 17, 2025